How-To Tutorials

22 Nov 2013

15 min read

Unity Networking – The Pong Game

22 Nov 2013

(For more resources related to this topic, see here.) Multiplayer is everywhere. It's a staple of AAA games and small-budget indie offerings alike. Multiplayer games tap into our most basic human desires. Whether it be teaming up with strangers to survive a zombie apocalypse, or showing off your skills in a round of "Capture the Flag" on your favorite map, no artificial intelligence in the world comes close to the feeling of playing with a living, breathing, and thinking human being. Unity3D has a sizable number of third-party networking middleware aimed at developing multiplayer games, and is arguably one of the easiest platforms to prototype multiplayer games. The first networking system most people encounter in Unity is the built-in Unity Networking API . This API simplifies a great many tasks in writing networked code by providing a framework for networked objects rather than just sending messages. This works by providing a NetworkView component, which can serialize object state and call functions across the network. Additionally, Unity provides a Master server, which essentially lets players search among all public servers to find a game to join, and can also help players in connecting to each other from behind private networks. In this article, we will cover: Introducing multiplayer Introducing UDP communication Setting up your own Master server for testing What a NetworkView is Serializing object state Calling RPCs Starting servers and connecting to them Using the Master server API to register servers and browse available hosts Setting up a dedicated server model Loading networked levels Creating a Pong clone using Unity networking Introducing multiplayer games Before we get started on the details of communication over the Internet, what exactly does multiplayer entail in a game? As far as most players are concerned, in a multiplayer game they are sharing the same experience with other players. It looks and feels like they are playing the same game. In reality, they aren't. Each player is playing a separate game, each with its own game state. Trying to ensure that all players are playing the exact same game is prohibitively expensive. Instead, games attempt to synchronize just enough information to give the illusion of a shared experience. Games are almost ubiquitously built around a client-server architecture, where each client connects to a single server. The server is the main hub of the game, ideally the machine for processing the game state, although at the very least it can serve as a simple "middleman" for messages between clients. Each client represents an instance of the game running on a computer. In some cases the server might also have a client, for instance some games allow you to host a game without starting up an external server program. While an MMO ( Massively Multiplayer Online ) might directly connect to one of these servers, many games do not have prior knowledge of the server IPs. For example, FPS games often let players host their own servers. In order to show the user a list of servers they can connect to, games usually employ another server, known as the "Master Server" or alternatively the "Lobby server". This server's sole purpose is to keep track of game servers which are currently running, and report a list of these to clients. Game servers connect to the Master server in order to announce their presence publicly, and game clients query the Master server to get an updated list of game servers currently running. Alternatively, this Master server sometimes does not keep track of servers at all. Sometimes games employ "matchmaking", where players connect to the Lobby server and list their criteria for a game. The server places this player in a "bucket" based on their criteria, and whenever a bucket is full enough to start a game, a host is chosen from these players and that client starts up a server in the background, which the other players connect to. This way, the player does not have to browse servers manually and can instead simply tell the game what they want to play. Introducing UDP communication The built-in Unity networking is built upon RakNet . RakNet uses UDP communication for efficiency. UDP ( User Datagram Protocols ) is a simple way to send messages to another computer. These messages are largely unchecked, beyond a simple checksum to ensure that the message has not been corrupted. Because of this, messages are not guaranteed to arrive, nor are they guaranteed to only arrive once (occasionally a single message can be delivered twice or more), or even in any particular order. TCP, on the other hand, guarantees each message to be received just once, and in the exact order they were sent, although this can result in increased latency (messages must be resent several times if they fail to reach the target, and messages must be buffered when received, in order to be processed in the exact order they were sent). To solve this, a reliability layer must be built on top of UDP. This is known as rUDP ( reliable UDP ). Messages can be sent unreliably (they may not arrive, or may arrive more than once), or reliably (they are guaranteed to arrive, only once per message, and in the correct order). If a reliable message was not received or was corrupt, the original sender has to resend the message. Additionally, messages will be stored rather than immediately processed if they are not in order. For example, if you receive messages 1, 2, and 4, your program will not be able to handle those messages until message 3 arrives. Allowing unreliable or reliable switching on a per-message basis affords better overall performance. Messages, such as player position, are better suited to unreliable messages (if one fails to arrive, another one will arrive soon anyway), whereas damage messages must be reliable (you never want to accidentally drop a damage message, and having them arrive in the same order they were sent reduces race conditions). In Unity, you can serialize the state of an object (for example, you might serialize the position and health of a unit) either reliably or unreliably (unreliable is usually preferred). All other messages are sent reliably. Setting up the Master Server Although Unity provide their own default Master Server and Facilitator (which is connected automatically if you do not specify your own), it is not recommended to use this for production. We'll be using our own Master Server, so you know how to connect to one you've hosted yourself. Firstly, go to the following page: http://unity3d.com/master-server/ We're going to download two of the listed server components: the Master Server and the Facilitator as shown in the following screenshot: The servers are provided in full source, zipped. If you are on Windows using Visual Studio Express, open up the Visual Studio .sln solution and compile in the Release mode. Navigate to the Release folder and run the EXE (MasterServer.exe or Facilitator.exe). If you are on a Mac, you can either use the included XCode project, or simply run the Makefile (the Makefile works under both Linux and Mac OS X). The Master Server, as previously mentioned, enables our game to show a server lobby to players. The Facilitator is used to help clients connect to each other by performing an operation known as NAT punch-through . NAT is used when multiple computers are part of the same network, and all use the same public IP address. NAT will essentially translate public and private IPs, but in order for one machine to connect to another, NAT punch-through is necessary. You can read more about it here: http://www.raknet.net/raknet/manual/natpunchthrough.html The default port for the Master Server is 23466, and for the Facilitator is 50005. You'll need these later in order to configure Unity to connect to the local Master Server and Facilitator instead of the default Unity-hosted servers. Now that we've set up our own servers, let's take a look at the Unity Networking API itself. NetworkViews and state serialization In Unity, game objects that need to be networked have a NetworkView component. The NetworkView component handles communication over the network, and even helps make networked state serialization easier. It can automatically serialize the state of a Transform, Rigidbody, or Animation component, or in one of your own scripts you can write a custom serialization function. When attached to a game object, NetworkView will generate a NetworkViewID for NetworkView. This ID serves to uniquely identify a NetworkView across the network. An object can be saved as part of a scene with NetworkView attached (this can be used for game managers, chat boxes, and so on), or it can be saved in the project as a prefab and spawned later via Network.Instantiate (this is used to generate player objects, bullets, and so on). Network.Instantiate is the multiplayer equivalent to GameObject.Instantiate —it sends a message over the network to other clients so that all clients spawn the object. It also assigns a network ID to the object, which is used to identify the object across multiple clients (the same object will have the same network ID on every client). A prefab is a template for a game object (such as the player object). You can use the Instantiate methods to create a copy of the template in the scene. Spawned network game objects can also be destroyed via Network.Destroy. It is the multiplayer counterpart of GameObject.Destroy. It sends a message to all clients so that they all destroy the object. It also deletes any RPC messages associated with that object. NetworkView has a single component that it will serialize. This can be a Transform, a Rigidbody, an Animation, or one of your own components that has an OnSerializeNetworkView function. Serialized values can either be sent with the ReliableDeltaCompressed option, where values are always sent reliably and compressed to include only changes since the last update, or they can be sent with the Unreliable option, where values are not sent reliably and always include the full values (not the change since the last update, since that would be impossible to predict over UDP). Each method has its own advantages and disadvantages. If data is constantly changing, such as player position in a first person shooter, in general Unreliable is preferred to reduce latency. If data does not often change, use the ReliableDeltaCompressed option to reduce bandwidth (as only changes will be serialized). NetworkView can also call methods across the network via Remote Procedure Calls ( RPC ). RPCs are always completely reliable in Unity Networking, although some networking libraries allow you to send unreliable RPCs, such as uLink or TNet. Writing a custom state serializer While initially a game might simply serialize Transform or Rigidbody for testing, eventually it is often necessary to write a custom serialization function. This is a surprisingly easy task. Here is a script that sends an object's position over the network: using UnityEngine; using System.Collections; public class ExampleUnityNetworkSerializePosition : MonoBehaviour { public void OnSerializeNetworkView( BitStream stream, NetworkMessageInfo info ) { // we are currently writing information to the network if( stream.isWriting ) { // send the object's position Vector3 position = transform.position; stream.Serialize( ref position ); } // we are currently reading information from the network else { // read the first vector3 and store it in 'position' Vector3 position = Vector3.zero; stream.Serialize( ref position ); // set the object's position to the value we were sent transform.position = position; } } } Most of the work is done with BitStream. This is used to check if NetworkView is currently writing the state, or if it is reading the state from the network. Depending on whether it is reading or writing, stream.Serialize behaves differently. If NetworkView is writing, the value will be sent over the network. However, if NetworkView is reading, the value will be read from the network and saved in the referenced variable (thus the ref keyword, which passes Vector3 by reference rather than value). Using RPCs RPCs are useful for single, self-contained messages that need to be sent, such as a character firing a gun, or a player saying something in chat. In Unity, RPCs are methods marked with the [RPC] attribute. This can be called by name via networkView.RPC( "methodName", … ). For example, the following script prints to the console on all machines when the space key is pressed. using UnityEngine; using System.Collections; public class ExampleUnityNetworkCallRPC : MonoBehavior { void Update() { // important – make sure not to run if this networkView is notours if( !networkView.isMine ) return; // if space key is pressed, call RPC for everybody if( Input.GetKeyDown( KeyCode.Space ) ) networkView.RPC( "testRPC", RPCMode.All ); } [RPC] void testRPC( NetworkMessageInfo info ) { // log the IP address of the machine that called this RPC Debug.Log( "Test RPC called from " + info.sender.ipAddress ); } } Also note the use of NetworkView.isMine to determine ownership of an object. All scripts will run 100 percent of the time regardless of whether your machine owns the object or not, so you have to be careful to avoid letting some logic run on remote machines; for example, player input code should only run on the machine that owns the object. RPCs can either be sent to a number of players at once, or to a specific player. You can either pass an RPCMode to specify which group of players to receive the message, or a specific NetworkPlayer to send the message to. You can also specify any number of parameters to be passed to the RPC method. RPCMode includes the following entries: All (the RPC is called for everyone) AllBuffered (the RPC is called for everyone, and then buffered for when new players connect, until the object is destroyed) Others (the RPC is called for everyone except the sender) OthersBuffered (the RPC is called for everyone except the sender, and then buffered for when new players connect, until the object is destroyed) Server (the RPC is sent to the host machine) Initializing a server The first thing you will want to set up is hosting games and joining games. To initialize a server on the local machine, call Network.InitializeServer. This method takes three parameters: the number of allowed incoming connections, the port to listen on, and whether to use NAT punch-through. The following script initializes a server on port 25000 which allows 8 clients to connect: using UnityEngine; using System.Collections; public class ExampleUnityNetworkInitializeServer : MonoBehavior { void OnGUI() { if( GUILayout.Button( "Launch Server" ) ) { LaunchServer(); } } // launch the server void LaunchServer() { // Start a server that enables NAT punchthrough, // listens on port 25000, // and allows 8 clients to connect Network.InitializeServer( 8, 25005, true ); } // called when the server has been initialized void OnServerInitialized() { Debug.Log( "Server initialized" ); } } You can also optionally enable an incoming password (useful for private games) by setting Network.incomingPassword to a password string of the player's choice, and initializing a general-purpose security layer by calling Network.InitializeSecurity(). Both of these should be set up before actually initializing the server. Connecting to a server To connect to a server you know the IP address of, you can call Network.Connect. The following script allows the player to enter an IP, a port, and an optional password and attempts to connect to the server: using UnityEngine; using System.Collections; public class ExampleUnityNetworkingConnectToServer : MonoBehavior { private string ip = ""; private string port = ""; private string password = ""; void OnGUI() { GUILayout.Label( "IP Address" ); ip = GUILayout.TextField( ip, GUILayout.Width( 200f ) ); GUILayout.Label( "Port" ); port = GUILayout.TextField( port, GUILayout.Width( 50f ) ); GUILayout.Label( "Password (optional)" ); password = GUILayout.PasswordField( password, '*',GUILayout.Width( 200f ) ); if( GUILayout.Button( "Connect" ) ) { int portNum = 25005; // failed to parse port number – a more ideal solution is tolimit input to numbers only, a number of examples can befound on the Unity forums if( !int.TryParse( port, out portNum ) ) { Debug.LogWarning( "Given port is not a number" ); } // try to initiate a direct connection to the server else { Network.Connect( ip, portNum, password ); } } } void OnConnectedToServer() { Debug.Log( "Connected to server!" ); } void OnFailedToConnect( NetworkConnectionError error ) { Debug.Log( "Failed to connect to server: " +error.ToString() ); } } Connecting to the Master Server While we could just allow the player to enter IP addresses to connect to servers (and many games do, such as Minecraft), it's much more convenient to allow the player to browse a list of public servers. This is what the Master Server is for. Now that you can start up a server and connect to it, let's take a look at how to connect to the Master Server you downloaded earlier. First, make sure both the Master Server and Facilitator are running. I will assume you are running them on your local machine (IP is 127.0.0.1), but of course you can run these on a different computer and use that machine's IP address. Keep in mind, if you want the Master Server publicly accessible, it must be installed on a machine with a public IP address (it cannot be in a private network). Let's configure Unity to use our Master Server rather than the Unity-hosted test server. The following script configures the Master Server and Facilitator to connect to a given IP (by default 127.0.0.1): using UnityEngine; using System.Collections; public class ExampleUnityNetworkingConnectToMasterServer : MonoBehaviour { // Assuming Master Server and Facilitator are on the same machine public string MasterServerIP = "127.0.0.1"; void Awake() { // set the IP and port of the Master Server to connect to MasterServer.ipAddress = MasterServerIP; MasterServer.port = 23466; // set the IP and port of the Facilitator to connect to Network.natFacilitatorIP = MasterServerIP; Network.natFacilitatorPort = 50005; } }

0
0
13655

Packt

21 Nov 2013

5 min read

Platform as a Service

Packt

21 Nov 2013

5 min read

0
0
12208

article-image-drawing-and-drawables-android-canvas

Packt

21 Nov 2013

8 min read

Drawing and Drawables in Android Canvas

Packt

21 Nov 2013

8 min read

In this article by Mir Nauman Tahir, the author of the book Learning Android Canvas, our goal is to learn about the following: Drawing on a Canvas Drawing on a View Drawing on a SurfaceView Drawables Drawables from resource images Drawables from resource XML Shape Drawables (For more resources related to this topic, see here.) Android provides us with 2D drawing APIs that enable us to draw our custom drawing on the Canvas. When working with 2D drawings, we will either draw on view or directly on the surface or Canvas. Using View for our graphics, the drawing is handled by the system's normal View hierarchy drawing process. We only define our graphics to be inserted in the View; the rest is done automatically by the system. While using the method to draw directly on the Canvas, we have to manually call the suitable drawing Canvas methods such as onDraw() or createBitmap(). This method requires more efforts and coding and is a bit more complicated, but we have everything in control such as the animation and everything else like being in control of the size and location of the drawing and the colors and the ability to move the drawing from its current location to another location through code. The implementation of the onDraw() method can be seen in the drawing on the view section and the code for createBitmap() is shown in the Drawing on a Canvas section. We will use the drawing on the View method if we are dealing with static graphics–static graphics do not change dynamically during the execution of the application–or if we are dealing with graphics that are not resource hungry as we don't wish to put our application performance at stake. Drawing on a View can be used for designing eye-catching simple applications with static graphics and simple functionality–simple attractive backgrounds and buttons. It's perfectly okay to draw on View using the main UI thread as these graphics are not a threat to the overall performance of our application. The drawing on a Canvas method should be used when working with heavy graphics that change dynamically like those in games. In this scenario, the Canvas will continuously redraw itself to keep the graphics updated. We can draw on a Canvas using the main UI thread, but when working with heavy, resource-hungry, dynamically changing graphics, the application will continuously redraw itself. It is better to use a separate thread to draw these graphics. Keeping such graphics on the main UI thread will not make them go into the non-responding mode, and after working so hard we certainly won't like this. So this choice should be made very carefully. Drawing on a Canvas A Canvas is an interface, a medium that enables us to actually access the surface, which we will use to draw our graphics. The Canvas contains all the necessary drawing methods needed to draw our graphics. The actual internal mechanism of drawing on a Canvas is that, whenever anything needs to be drawn on the Canvas, it's actually drawn on an underlying blank bitmap image. By default, this bitmap is automatically provided for us. But if we want to use a new Canvas, then we need to create a new bitmap image and then a new Canvas object while providing the already created bitmap to the constructor of the Canvas class. A sample code is explained as follows. Initially, the bitmap is drawn but not on the screen; it's actually drawn in the background on an internal Canvas. But to bring it to the front, we need to create a new Canvas object and provide the already created bitmap to it to be painted on the screen. Bitmap ourNewBitmap = Bitmap.CreateBitmap(100,100,Bitmap.Config.ARGB_8888); Canvas ourNewCanvas = new Canvas(ourNewBitmap); Drawing on a View If our application does not require heavy system resources or fast frame rates, we should use View.onDraw(). The benefit in this case is that the system will automatically give the Canvas its underlying bitmap as well. All we need is to make our drawing calls and be done with our drawings. We will create our class by extending it from the View class and will define the onDraw() method in it. The onDraw() method is where we will define whatever we want to draw on our Canvas. The Android framework will call the onDraw() method to ask our View to draw itself. The onDraw() method will be called by the Android framework on a need basis; for example, whenever our application wants to draw itself, this method will be called. We have to call the invalidate() method whenever we want our view to redraw itself. This means that, whenever we want our application's view to be redrawn, we will call the invalidate() method and the Android framework will call the onDraw() method for us. Let's say we want to draw a line, then the code would be something like this: class DrawView extends View { Paint paint = new Paint(); public DrawView(Context context) { super(context); paint.setColor(Color.BLUE); } @Override public void onDraw(Canvas canvas) { super.onDraw(canvas); canvas.drawLine(10, 10, 90, 10, paint); } } Inside the onDraw() method, we will use all kinds of facilities that are provided by the Canvas class such as the different drawing methods made available by the Canvas class. We can also use drawing methods from other classes as well. The Android framework will draw a bitmap on the Canvas for us once our onDraw() method is complete with all our desired functionality. If we are using the main UI thread, we will call the invalidate() method, but if we are using another thread, then we will call the postInvalidate() method. Drawing on a SurfaceView The View class provides a subclass SurfaceView that provides a dedicated drawing surface within the hierarchy of the View. The goal is to draw using a secondary thread so that the application won't wait for the resources to be free and ready to redraw. The secondary thread has access to the SurfaceView object that has the ability to draw on its own Canvas with its own redraw frequency. We will start by creating a class that will extend the SurfaceView class. We should implement an interface SurfaceHolder.Callback. This interface is important in the sense that it will provide us with the information when a surface is created, modified, or destroyed. When we have timely information about the creation, change, or destruction of a surface, we can make a better decision on when to start drawing and when to stop. The secondary thread class that will perform all the drawing on our Canvas can also be defined in the SurfaceView class. To get information, the Surface object should be handled through SurfaceHolder and not directly. To do this, we will get the Holder by calling the getHolder() method when the SurfaceView is initialized. We will then tell the SurfaceHolder object that we want to receive all the callbacks; to do this, we will call addCallBacks(). After this, we will override all the methods inside the SurfaceView class to get our job done according to our functionality. The next step is to draw the surface's Canvas from inside the second thread; to do this, we will pass our SurfaceHandler object to the thread object and will get the Canvas using the lockCanvas() method. This will get the Canvas for us and will lock it for the drawing from the current thread only. We need to do this because we don't want an open Canvas that can be drawn by another thread; if this is the situation, it will disturb all our graphics and drawings on the Canvas. When we are done with drawing our graphics on the Canvas, we will unlock the Canvas by calling the unlockCanvasAndPost() method and will pass our Canvas object. To have a successful drawing, we will need repeated redraws; so we will repeat this locking and unlocking as needed and the surface will draw the Canvas. To have a uniform and smooth graphic animation, we need to have the previous state of the Canvas; so we will retrieve the Canvas from the SurfaceHolder object every time and the whole surface should be redrawn each time. If we don't do so, for instance, not painting the whole surface, the drawing from the previous Canvas will persist and that will destroy the whole look of our graphic-intense application. A sample code would be the following: class OurGameView extends SurfaceView implements SurfaceHolder.Callback { Thread thread = null; SurfaceHolder surfaceHolder; volatile boolean running = false; public void OurGameView (Context context) { super(context); surfaceHolder = getHolder(); } public void onResumeOurGameView (){ running = true; thread = new Thread(this); thread.start(); } public void onPauseOurGameView(){ boolean retry = true; running = false; while(retry){ thread.join(); retry = false; } public void run() { while(running){ if(surfaceHolder.getSurface().isValid()){ Canvas canvas = surfaceHolder.lockCanvas(); //... actual drawing on canvas surfaceHolder.unlockCanvasAndPost(canvas); } } } }

0
0
11690

How-To Tutorials

Packt

21 Nov 2013

9 min read

Security considerations

Packt

21 Nov 2013

9 min read

(For more resources related to this topic, see here.) Security considerations One general piece of advice that applies to every type of application development is to develop the software with security in mind, meaning it is more expensive for an error-prone application to first implement the needed features and after that to make modifications in them to enforce security. Instead, this should be done simultaneously. In this article we are raising security awareness, and next we will learn about which measures we can apply and what we can do in order to have more secure applications. Use TLS TLS (the cryptographic protocol named Transport Layer Security) is the result of the standardization of the SSL protocol (Version 3.0), which was developed by Netscape and was proprietary. Thus, in various documents and specifications, we can find the use of TLS and SSL interchangeably, even though there are actually differences in the protocol. From a security standpoint, it is recommended that all requests sent from the client during the execution of a grant flow are done over TLS. In fact, it is recommended TLS be used on both sides of the connection. OAuth 2.0 relies heavily on TLS; this is done in order to maintain confidentiality of the exchanged data over the network by providing encryption and integrity on top of the connection between the client and server. In retrospect, in OAuth 1.0 the use of TLS was not mandatory, and parts of the authorization flow (on both server side and client side) had to deal with cryptography, which resulted in various implementations, some good and some sloppy. When we make an HTTP request (for example, in order to execute some OAuth 2.0 grant flow), in order to make the connection secure the HTTP client library that is used to execute the request has to be configured to use TLS. TLS is to be used by the client application when sending requests to both authorization and resource servers, and is to be used by the servers themselves as well. The result is an end-to-end TLS protected connection. If end-to-end protection cannot be established, it is advised to reduce the scope and lifetime of the access tokens that are issued by the authorization server. The OAuth2.0 specification states that the use of TLS is mandatory when sending requests to the authorization and token endpoints and when sending requests using password authentication. Access tokens, refresh tokens, username and password combinations, and client credentials must be transmitted with the use of TLS. By using TLS, the attackers that are trying to intercept/eavesdrop the exchanged information during the execution of the grant flow will not be able to do so. If TLS is not used, attackers can eavesdrop on an access token, an authorization code, a username and password combination, or other critical information. This means that the use of TLS prevents man-in-the-middle attacks and replaying of already fulfilled requests (also called replay attacks). By performing replay attempts, the attackers can issue themselves new access tokens or can perform replays on a request towards resource servers and modify or delete data belonging to the resource owner. Last but not least, the authorization server can enforce the use of TLS on every endpoint in order to reduce the risk of phishing attacks. Ensure web server application protection For client applications that are actually web applications deployed on a server, there are numerous protection measures that can be taken into account so that the server, the database, and the configuration files are kept safe. The list is not limited and can vary between scenarios and environments; some of the key measures are as follows: Install recommended security additions and tools for the given web and database servers that are in use. Restrict remote administrator access only to the people that require it (for example, for server maintenance and application monitoring). Regulate which server user can have which roles, and regulate permissions for the resources available to them. Disable or remove unnecessary services on the server. Regulate the database connections so that they are only available to the client application. Close unnecessary open ports on the server; leaving them open can give an advantage to the attacker. Configure protection against SQL injection. Configure database and file encryption for vital information stored (credentials and so on). Avoid storing credentials in plain text format. Keep the software components that are in use updated in order to avoid security exploitation. Avoid security misconfiguration. It is important to have in mind what kind of web server it is, which database is used, which modules the client application uses, and on which services the client application depends, so that we can research how to apply the security measures appropriately. OWASP (Open Web Application Security Project) provides additional documentation on security measures and describes the industry's best practices regarding software security. It is an additional resource recommended for reference and research on this topic, and can be found at https://www.owasp.org. Ensure mobile and desktop application protection Mobile and desktop applications can be installed on devices and machines that can be part of internal/enterprise or external environments. They are more vulnerable compared to the applications deployed on regulated server environments. Attackers have a better chance to try to extract the source code from the applications and other data that comes with them. In order to provide the best possible security, some of the key measures are as follows: Use secure storage mechanisms provided by additional programming libraries and by features offered by the operating system for which the application is developed. In multiuser operating systems, store user specific data such as credentials or access and refresh tokens in locations that are not available to other users on the same system. As mentioned previously, credentials should not be stored in plain text format and should be encrypted. If using an embedded database (such as SQLite in most cases), try to enforce security measures against SQL injection and encrypt the vital information (or encrypt the whole embedded database). For mobile devices, advise the end user to utilize device lock (usually with a PIN, password, or face unlock). Implement an optional PIN or password lock on the application level that the end user can activate if desired (which can also serve as an alternative to the previous locking measure). Sanitize and validate the value from any input fields that are used in the applications, in order to avoid code injection, which can lead to changing the behavior or exposing data stored by the client application. When the application is ready to be packaged for production use (to be used by end users), perform code analysis for obfuscating code and removing the unused code. This will produce a smaller client application in file size, which will perform the same but it will be harder to reverse engineer. As usual, for additional reference and research we can refer to the OAuth2.0 threat model RFC document, to OWASP, and to security documentation specific to the programming language, tools, libraries, and operating system that the client application is built for. Utilize the state parameter As mentioned, with this parameter the state between the request and the callback is maintained. Even if it is an optional parameter it is highly advisable to use, and the value from the callback response will be validated if it is equal to the one that was sent. When setting the value for the state parameter in the request Don't use predictable values that can be guessed by attackers. Don't repeat the same value often between requests. Don't use values that can contain and expose some internal business logic of the system and can be used maliciously if discovered. Use session values: If the user agent—with which the user has authenticated and approved the authorization request—has its session cookie available, calculate a hash from it and use that one as the state value. Or use some string generator: If a session variable is not available as an alternative, we can use some generated programmable value. Some real world implementations do this by generating unique identifiers and using them as state values, commonly achieved by generating a random UUID (universally unique identifier) and converting it to a hexadecimal value. Keep track of which state value was set for which request (user session in most cases) and redirect URI, in order to validate that the returned one contains an equal value. Use refresh tokens when available For client applications that have obtained an access token and a refresh token along with it, upon access token expiry it is a good practice to request a new one by using the refresh token instead of going through the whole grant flow again. With this measure we are transmitting less data over the network and are providing less exposure that the attacker can monitor. Request the needed scope only As briefly mentioned previously in this article, it is highly advisable to specify only the required scope when requesting an access token instead of specifying the maximum one that is available. With this measure, if an attacker gets hold of the access token, he can take damaging actions only to the level specified by the scope, and not more. This is done for damage minimization until the token is revoked and invalidated. Summary In this article we learned what data is to be protected, what features OAuth 2.0 contains regarding information security, and which precautions we should take into consideration. Resources for Article: Further resources on this subject: Deploying a Vert.x application [Article] Building tiny Web-applications in Ruby using Sinatra [Article] Fine Tune the View layer of your Fusion Web Application [Article]

0
0
1387

Packt

21 Nov 2013

5 min read

Viewing on Mobile Devices

Packt

21 Nov 2013

5 min read

(For more resources related to this topic, see here.) Axure 7 makes it easy to get our work on real devices so we can test out ideas on a device in the lab or field, and so we can let stakeholders experience the design on the intended device with their own hands. Hosting prototypes To let users access our prototype on their own devices or the one we control for demonstration or usability testing, we are going to need to find a way to host it in an available web server. Fortunately, because our prototype is a collection of web pages and related files, we can use any web server that can be accessed by the device in question. AxShare One of the easiest ways to get our prototype on a device and in the hands of users is to use the free AxShare service as a hosting environment. To publish to AxShare, you first need to set up an AxShare account. To do this, go to http://share.axure.com and you'll be able to create a free account that allows for the hosting of 10 prototypes. If you need to host more prototypes or want a custom domain, there is also an AxShare subscription product. Once the account is created, publishing a prototype to AxShare just requires a click on the AxShare button in the top toolbar pane of the prototype editor. When we publish to AxShare, we see the dialog box as shown in the following screenshot. We use it to sign in to our AxShare account and can save the sign in credentials. In this dialog box, we can choose to create and upload a new prototype or update an existing one, and decide if we want to use password protection to restrict access to our work. When a new prototype is uploaded to AxShare, it will automatically create a prototype ID that allows people to access it using a mobile web browser. Remember that Axure ultimately creates HTML files, even when we are simulating fullscreen native apps. We'll see the dialog box shown in the following screenshot when publishing, which confirms we are uploading our files and provides us with the URL we can use to get the prototype on a mobile browser. The random characters at the end of the URL are the prototype ID that AxShare created. One thing to note is that the URL provided in the dialog box links to the main prototype page, which includes the left panel used to navigate pages or do variable debugging. When sending a prototype to a mobile device, we are going to want to get the exact URL of our home screen and not the main prototype file. This means the URL for a prototype at http://share.axure.com/C1374Q/ will need to be updated to include a specific screen URL so it ends up being more like http://share.axure.com/C1374Q/publishing.html. We can get this URL by closing the left navigation panel after selecting our start screen, and then copying the URL from the desktop browser address bar. And with this URL the person accessing it on a phone is getting our intended start page. One thing to be aware of when hosting prototypes on AxShare or any cloud-based solution is that there may be network latency just as there is with real mobile websites due to the speed of the network being used and general network traffic. If we need the prototype to behave more real like a native app in which much of the user interface and some data is already on the device, we should consider using HTML 5 device caching. Home screen icons and splash pages If we want our app to look like it is running in fullscreen mode like a native app, we can add a home screen icon, an iOS splash screen, and hide the browser's navigation. To do this, add the home screen and splash page PNG images at the sizes specified in the Mobile/Device panel of the Generate Prototype dialog box. We also need to select the checkboxes for the Hide address bar and Hide browser nav (when launched from iOS home screen) options. This panel is also used to generate the HTML viewport meta tag and instruct the mobile browser to hide the address bar and browser navigation in iOS. We can also set the iOS status bar to use the default appearance, a black background, or a black translucent background. For each prototype, we'll want to experiment with these settings to see which works best for our project. We can add the prototype to the home screen using native browser and OS functionality (this differs between iOS and Android, and even between different versions of Android). On an iPhone 5 which is running on iOS 7, our PNG image appears as an iPhone home screen as shown in the following screenshot: The splash screen will load as a fullscreen image prior to our main prototype page loading, so users see a fullscreen image as seen in the following screenshot, when they first tap the home screen icon to open our prototype. This is one way to make our prototype feel more like a native app. Summary In this article we saw how prototypes can be tested or demonstrated on mobile devices using tools, such as AxShare. We also saw how to create home screen icons and splash pages. Resources for Article: Further resources on this subject: Creating and configuring a basic mobile application [Article] Common design patterns and how to prototype them [Article] Creating mobile friendly themes [Article]

0
0
2635

How-To Tutorials

article-image-exploring-model-view-controller

Packt

21 Nov 2013

5 min read

Exploring Model View Controller

Packt

21 Nov 2013

5 min read

(For more resources related to this topic, see here.) Many applications start from something small, such as several hundred lines of code prototype of a toy application written in one evening. When you add new features and the application code clutters, it becomes much harder to understand how it works and to modify it, especially for a newcomer. The Model-View-Controller (MVC) pattern serves as the basis for software architecture that will be easily maintained and modified. The main idea of MVC is about separating an application into three parts: model, view, and controller. There is an easy way to understand MVC—the model is the data and its business logic, the view is the window on the screen, and the controller is the glue between the two. While the view and controller depend on the model, the model is independent of the presentation or the controller. This is a key feature of the division. It allows you to work with the model, and hence, the business logic of the application, regardless of the visual presentation. The following diagram shows the flow of interaction between the user, controller, model, and view. Here, a user makes a request to the application and the controller does the initial processing. After that it manipulates the model, creating, updating, or deleting some data there. The model returns some result to the controller, that passes the result to view, which renders data to the user. The MVC pattern gained wide popularity in web development. Many Python web frameworks, such as web2py, Pyramid, Django (uses a flavor of MVC called MVP), Giotto, and Kiss use it. Let's review key components of the MVC pattern in more detail. Model – the knowledge of the application The model is a cornerstone of the application because, while the view and controller depend on the model, the model is independent of the presentation or the controller. The model provides knowledge: data, and how to work with that data. The model has a state and methods for changing its state but does not contain information on how this knowledge can be visualized. This independence makes working independently, covering the model with tests and substituting the controllers/views without changing the business logic of an application. The model is responsible for maintaining the integrity of the program's data, because if that gets corrupted then it's game over for everyone. The following are recommendations for working with models: Strive to perform the following for models: Create data models and interface of work with them Validate data and report all errors to the controller Avoid working directly with the user interface View – the appearance of knowledge View receives data from the model through the controller and is responsible for its visualization. It should not contain complex logic; all such logic should go to the models and controllers. If you need to change the method of visualization, for example, if you need your web application to be rendered differently depending on whether the user is using a mobile phone or desktop browser, you can change the view accordingly. This can include HTML, XML, console views, and so on. The recommendation for working with views are as follows: Strive to perform the following for views: Try to keep them simple; use only simple comparisons and loops Avoid doing the following in views: Accessing the database directly Using any logic other than loops and conditional statements (if-then-else) because the separation of concerns requires all such complex logic to be performed in models Controller – the glue between the model and view The direct responsibility of the controllers is to receive data from the request and send it to other parts of the system. Only in this case, the controller is "thin" and is intended only as a bridge (glue layer) between the individual components of the system. Let's look at the following recommendations for working with controllers: Strive to perform the following in controllers: Pass data from user requests to the model for processing, retrieving and saving the data Pass data to views for rendering Handle all request errors and errors from models Avoid the following in controllers: Render data Work with the database and business logic directly Thus, in one statement: We need smart models, thin controllers, and dumb views. Benefits of using the MVC MVC brings a lot of positive attributes to your software, including the following: Decomposition allows you to logically split the application into three relatively independent parts with loose coupling and will decrease its complexity. Developers typically specialize in one area, for example, a developer might create a user interface or modify the business logic. Thus, it's possible to limit their area of responsibility to only some part of code. MVC makes it possible to change visualization, thus modifying the view without changes in the business logic. MVC makes it possible to change business logic, thus modifying the model without changes in visualization. MVC makes it possible to change the response to a user action (clicking on the button with the mouse, data entry) without changing the implementation of views; it is sufficient to use a different controller. Summary It is important to separate the areas of responsibility to maintain loose coupling and for the maintainability of the software. MVC divides the application into three relatively independent parts: model, view, and controller. The model is all about knowledge, data, and business logic. The view is about presentation to the end users, and it's important to keep it simple. The controller is the glue between the model and the view, and it's important to keep it thin. Resources for Article: Further resources on this subject: Getting Started with Spring Python [Article] Python Testing: Installing the Robot Framework [Article] Getting Up and Running with MySQL for Python [Article]

0
0
10414

Packt

21 Nov 2013

10 min read

Our First Machine Learning Method – Linear Classification

Packt

21 Nov 2013

10 min read

(For more resources related to this topic, see here.) To get a grip on the problem of machine learning in scikit-learn, we will start with a very simple machine learning problem: we will try to predict the Iris flower species using only two attributes: sepal width and sepal length. This is an instance of a classification problem, where we want to assign a label (a value taken from a discrete set) to an item according to its features. Let's first build our training dataset—a subset of the original sample, represented by the two attributes we selected and their respective target values. After importing the dataset, we will randomly select about 75 percent of the instances, and reserve the remaining ones (the evaluation dataset) for evaluation purposes (we will see later why we should always do that): >>> from sklearn.cross_validation import train_test_split >>> from sklearn import preprocessing >>> # Get dataset with only the first two attributes >>> X, y = X_iris[:, :2], y_iris >>> # Split the dataset into a training and a testing set >>> # Test set will be the 25% taken randomly >>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33) >>> print X_train.shape, y_train.shape (112, 2) (112,) >>> # Standardize the features >>> scaler = preprocessing.StandardScaler().fit(X_train) >>> X_train = scaler.transform(X_train) >>> X_test = scaler.transform(X_test) The train_test_split function automatically builds the training and evaluation datasets, randomly selecting the samples. Why not just select the first 112 examples? This is because it could happen that the instance ordering within the sample could matter and that the first instances could be different to the last ones. In fact, if you look at the Iris datasets, the instances are ordered by their target class, and this implies that the proportion of 0 and 1 classes will be higher in the new training set, compared with that of the original dataset. We always want our training data to be a representative sample of the population they represent. The last three lines of the previous code modify the training set in a process usually called feature scaling. For each feature, calculate the average, subtract the mean value from the feature value, and divide the result by their standard deviation. After scaling, each feature will have a zero average, with a standard deviation of one. This standardization of values (which does not change their distribution, as you could verify by plotting the X values before and after scaling) is a common requirement of machine learning methods, to avoid that features with large values may weight too much on the final results. Now, let's take a look at how our training instances are distributed in the two-dimensional space generated by the learning feature. pyplot, from the matplotlib library, will help us with this: >>> import matplotlib.pyplot as plt >>> colors = ['red', 'greenyellow', 'blue'] >>> for i in xrange(len(colors)): >>> xs = X_train[:, 0][y_train == i] >>> ys = X_train[:, 1][y_train == i] >>> plt.scatter(xs, ys, c=colors[i]) >>> plt.legend(iris.target_names) >>> plt.xlabel('Sepal length') >>> plt.ylabel('Sepal width') The scatter function simply plots the first feature value (sepal width) for each instance versus its second feature value (sepal length) and uses the target class values to assign a different color for each class. This way, we can have a pretty good idea of how these attributes contribute to determine the target class. The following screenshot shows the resulting plot: Looking at the preceding screenshot, we can see that the separation between the red dots (corresponding to the Iris setosa) and green and blue dots (corresponding to the two other Iris species) is quite clear, while separating green from blue dots seems a very difficult task, given the two features available. This is a very common scenario: one of the first questions we want to answer in a machine learning task is if the feature set we are using is actually useful for the task we are solving, or if we need to add new attributes or change our method. Given the available data, let's, for a moment, redefine our learning task: suppose we aim, given an Iris flower instance, to predict if it is a setosa or not. We have converted our problem into a binary classification task (that is, we only have two possible target classes). If we look at the picture, it seems that we could draw a straight line that correctly separates both the sets (perhaps with the exception of one or two dots, which could lie in the incorrect side of the line). This is exactly what our first classification method, linear classification models, tries to do: build a line (or, more generally, a hyperplane in the feature space) that best separates both the target classes, and use it as a decision boundary (that is, the class membership depends on what side of the hyperplane the instance is). To implement linear classification, we will use the SGDClassifier from scikit-learn. SGD stands for Stochastic Gradient Descent, a very popular numerical procedure to find the local minimum of a function (in this case, the loss function, which measures how far every instance is from our boundary). The algorithm will learn the coefficients of the hyperplane by minimizing the loss function. To use any method in scikit-learn, we must first create the corresponding classifier object, initialize its parameters, and train the model that better fits the training data. You will see while you advance that this procedure will be pretty much the same for what initially seemed very different tasks. >>> from sklearn.linear_modelsklearn._model import SGDClassifier >>> clf = SGDClassifier() >>> clf.fit(X_train, y_train)</p></pre> The SGDClassifier initialization function allows several parameters. For the moment, we will use the default values, but keep in mind that these parameters could be very important, especially when you face more real-world tasks, where the number of instances (or even the number of attributes) could be very large. The fit function is probably the most important one in scikit-learn. It receives the training data and the training classes, and builds the classifier. Every supervised learning method in scikit-learn implements this function. What does the classifier look like in our linear model method? As we have already said, every future classification decision depends just on a hyperplane. That hyperplane is, then, our model. The coef_ attribute of the clf object (consider, for the moment, only the first row of the matrices), now has the coefficients of the linear boundary and the intercept_ attribute, the point of intersection of the line with the y axis. Let's print them: >>> print clf.coef_ [[-28.53692691 15.05517618] [ -8.93789454 -8.13185613] [ 14.02830747 -12.80739966]] >>> print clf.intercept_ [-17.62477802 -2.35658325 -9.7570213 ] Indeed in the real plane, with these three values, we can draw a line, represented by the following equation: -17.62477802 - 28.53692691 * x1 + 15.05517618 * x2 = 0 Now, given x1 and x2 (our real-valued features), we just have to compute the value of the left-side of the equation: if its value is greater than zero, then the point is above the decision boundary (the red side), otherwise it will be beneath the line (the green or blue side). Our prediction algorithm will simply check this and predict the corresponding class for any new iris flower. But, why does our coefficient matrix have three rows? Because we did not tell the method that we have changed our problem definition (how could we have done this?), and it is facing a three-class problem, not a binary decision problem. What, in this case, the classifier does is the same we did—it converts the problem into three binary classification problems in a one-versus-all setting (it proposes three lines that separate a class from the rest). The following code draws the three decision boundaries and lets us know if they worked as expected: >>> x_min, x_max = X_train[:, 0].min() - .5, X_train[:, 0].max() + .5 >>> y_min, y_max = X_train[:, 1].min() - .5, X_train[:, 1].max() + .5 >>> xs = np.arange(x_min, x_max, 0.5) >>> fig, axes = plt.subplots(1, 3) >>> fig.set_size_inches(10, 6) >>> for i in [0, 1, 2]: >>> axes[i].set_aspect('equal') >>> axes[i].set_title('Class '+ str(i) + ' versus the rest') >>> axes[i].set_xlabel('Sepal length') >>> axes[i].set_ylabel('Sepal width') >>> axes[i].set_xlim(x_min, x_max) >>> axes[i].set_ylim(y_min, y_max) >>> sca(axes[i]) >>> plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.prism) >>> ys = (-clf.intercept_[i] – Xs * clf.coef_[i, 0]) / clf.coef_[i, 1] >>> plt.plot(xs, ys, hold=True) The first plot shows the model built for our original binary problem. It looks like the line separates quite well the Iris setosa from the rest. For the other two tasks, as we expected, there are several points that lie on the wrong side of the hyperplane. Now, the end of the story: suppose that we have a new flower with a sepal width of 4.7 and a sepal length of 3.1, and we want to predict its class. We just have to apply our brand new classifier to it (after normalizing!). The predict method takes an array of instances (in this case, with just one element) and returns a list of predicted classes: >>>print clf.predict(scaler.transform([[4.7, 3.1]])) [0] If our classifier is right, this Iris flower is a setosa. Probably, you have noticed that we are predicting a class from the possible three classes but that linear models are essentially binary: something is missing. You are right. Our prediction procedure combines the result of the three binary classifiers and selects the class in which it is more confident. In this case, we will select the boundary line whose distance to the instance is longer. We can check that using the classifier decision_function method: >>>print clf.decision_function(scaler.transform([[4.7, 3.1]])) [[ 19.73905808 8.13288449 -28.63499119]] Summary In this article we included a very simple example of classification, trying to show the main steps for learning. Resources for Article: Further resources on this subject: Python Testing: Installing the Robot Framework [Article] Inheritance in Python [Article] Python 3: Object-Oriented Design [Article]

0
0
15384

article-image-developing-apps-google-speech-apis

Packt

21 Nov 2013

6 min read

Developing apps with the Google Speech APIs

Packt

21 Nov 2013

6 min read

(For more resources related to this topic, see here.) Speech technology has come of age. If you own a smartphone or tablet you can perform many tasks on your device using voice. For example, you can send a text message, update your calendar, set an alarm, and do many other things with a single spoken command that would take multiple steps to complete using traditional methods such as tapping and selecting. You can also ask the sorts of queries that you would previously have typed into your Google search box and get a spoken response. For those who wish to develop their own speech-based apps, Google provides APIs for the basic technologies of text-to-speech synthesis (TTS) and automated speech recognition (ASR). Using these APIs developers can create useful interactive voice applications. This article provides a brief overview of the Google APIs and then goes on to show some examples of voice-based apps built around the APIs. Using the Google text-to-speech synthesis API TTS has been available on Android devices since Android 1.6 (API Level 4). The components of the Google TTS API (package android.speech.tts) are documented at http://developer.android.com/reference/android/speech/tts/package-summary.html. Interfaces and classes are listed here and further details can be obtained by clicking on these. Starting the TTS engine involves creating an instance of the TextToSpeech class along with the method that will be executed when the TTS engine is initialized. Checking that TTS has been initialized is done through an interface called OnInitListener. If TTS initialization is complete, the method onInit is invoked. If TTS has been initialized correctly, the speak method is invoked to speak out some words: TextToSpeech tts = new TextToSpeech(this, new OnInitListener(){ public void onInit(int status) { if (status == TextToSpeech.SUCCESS) speak(“Hello world”, TextToSpeech.QUEUE_ADD, null); } } Due to limited storage on some devices, not all languages that are supported may actually be installed on a particular device so it is important to check if a particular language is available before creating the TextToSpeech object. This way, it is possible to download and install the required language-specific resource files if necessary. This is done by sending an Intent with the action ACTION_CHECK_TTS_DATA, which is part of the TextToSpeech.Engine class: Intent intent = new In-tent(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA); startActivityForResult(intent,TTS_DATA_CHECK); If the language data has been correctly installed, the onActivityResult handler will receive a CHECK_VOICE_DATA_PASS. If the data is not available, the action ACTION_INSTALL_TTS_DATA will be executed: Intent installData = new Intent (Engine. ACTION_INSTALL_TTS_DATA); startActivity(installData); The next figure shows an example of an app using the TTS API. A potential use-case for this type of app is when the user accesses some text on the Web - for example, a news item, email, or a sports report. This is useful if the user’s eyes and hands are busy, or if there are problems reading the text on the screen. In this example, the app retrieves some text and the user presses the Speak button to hear it. A Stop button is provided in case the user does not wish to hear all of the text. Using the Google speech recognition API The components of the Google Speech API (package android.speech) are documented at http://developer.android.com/reference/android/speech/package-summary.html. Interfaces and classes are listed and further details can be obtained by clicking on these. There are two ways in which speech recognition can be carried out on an Android Device: based solely on a RecognizerIntent, or by creating an instance of SpeechRecognizer. The following code shows how to start an activity to recognize speech using the first approach: Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH); // Specify language model intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, languageModel); // Specify how many results to receive. Results listed in order of confidence intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, numberRecoResults); // Start listening startActivityForResult(intent, ASR_CODE); The app shown below illustrates the following: The user selects the parameters for speech recognition. The user presses a button and says some words. The words recognized are displayed in a list along with their confidence scores. Multilingual apps It is important to be able to develop apps in languages other than English. The TTS and ASR engines can be configured to a wide range of languages. However, we cannot expect that all languages will be available or that they are supported on a particular device. Thus, before selecting a language it is necessary to check whether it is one of the supported languages, and if not, to set the currently preferred language. In order to do this, a RecognizerIntent.ACTION_GET_LANGUAGE_DETAILS ordered broadcast is sent that returns a Bundle from which the information about the preferred language (RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE) and the list of supported languages (RecognizerIntent.EXTRA_SUPPORTED_LANGUAGES) can be extracted. For speech recognition this introduces a new parameter for the intent in which the language is specified that will be used for recognition, as shown in the following code line: intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, language); As shown in the next figure, the user is asked for a language and then the app recognizes what the user says and plays back the best recognition result in the language selected. Creating a Virtual Personal Assistant (VPA) Many voice-based apps need to do more than simply speak and understand speech. For example, a VPA also needs the ability to engage in dialog with the user and to perform operations such as connecting to web services and activating device functions. One way to enable these additional capabilities is to make use of chatbot technology (see, for example, the Pandorabots web service: http://www.pandorabots.com/). The following figure shows two VPAs, Jack and Derek, that have been developed in this way. Jack is a general-purpose VPA, while Derek is a specialized VPA that can answer questions about Type 2 diabetes, such as symptoms, causes, treatment, risks to children, and complications. Summary The Google Speech APIs can be used in countless ways to develop interesting and useful voice-based apps. This article has shown some examples. By building on these you will be able to bring the power of voice to your Android apps, making them smarter and more intuitive, and boosting your users' mobile experience. Resources for Article: Further resources on this subject: Introducing an Android platform [Article] Building Android (Must know) [Article] Top 5 Must-have Android Applications [Article]

0
0
11558

How-To Tutorials

article-image-introducing-salesforce-chatter

Packt

21 Nov 2013

5 min read

Introducing Salesforce Chatter

Packt

21 Nov 2013

5 min read

(For more resources related to this topic, see here.) An overview of cloud computing Cloud computing is a subscription-based service that provides us with computing resources and networked storage space. It allows you to access your information anytime and from anywhere. The only requirement is that one must have an Internet connection. That's all. If you have a cloud-based setup, there is no need to maintain the server in the future. We can think of cloud computing as similar to our e-mail account. Think of your accounts such as Gmail, Hotmail, and so on. We just need a web browser and an Internet connection to access our information. We do not need separate software to access our e-mail account; it is different from the text editor installed on our computer. There is no need of physically moving storage and information; everything is up and running over there and not at our end. It is the same with cloud; we choose what has to be stored and accessed on cloud. You also don't have to pay an employee or contractor to maintain the server since it is based on the cloud. While traditional technologies and computer setup require you to be physically present at the same place to access information, cloud removes this barrier and allows us to access information from anywhere. Cloud computing helps businesses to perform better by allowing employees to work from remote locations (anywhere on the globe). It provides mobile access to information and ﬂexibility to the working of a business organization. Depending on your needs, we can subscribe to the following type of clouds: Public cloud: This cloud can be accessed by subscribers who have an Internet connection and access to cloud storage Private cloud: This is accessed by a limited group of people or members of an organization Community cloud: This is a cloud that is shared between two or more organizations that have similar requirements Hybrid cloud: This is a combination of at least two clouds, where the clouds are a combination of public, private, or community Depending on your need, you have the ability to subscribe to a specific cloud provider. Cloud providers follow the pay-as-you-go method. It means that, if your technological needs change, you can purchase more and continue working on cloud. You do not have to worry about the storage configuration and management of servers because everything is done by your cloud provider. An overview of salesforce.com Salesforce.com is the leader in pay-as-you-go enterprise cloud computing. It specializes in CRM software products for sales and customer services and supplies products for building and running business apps. Salesforce has recently developed a social networking product called Chatter for its business apps. With the concept of no software or hardware required, we are up and running and seeing immediate positive effects on our business. It is a platform for creating and deploying apps for social enterprise. This does not require us to buy or manage servers, software, or hardware. Here you can focus fully on building apps that include mobile functionality, business processes, reporting, and search. All apps run on secure servers and proven services that scale, tune, and back up data automatically. Collaboration in the past Collaboration always plays a key role to improve business outcomes; it is a crucial necessity in any professional business. The central meaning of communication has changed over time. With changes in people's individual living situations as well as advancements in technology, how one communicates with the rest of the world has been altered. A century or two ago, people could communicate using smoke signals, carrier pigeons and drum beats, or speak to one another, that is, face-to-face communication. As the world and technology developed, we found that we could send longer messages from long distances with ease. This has caused a decline in face-to-face-interaction and a substantial growth in communication via technology. The old way of face-to-face interaction impacted the business process as there was a gap between the collaboration of the client, company, or employees situated in distant places. So it reduced the profit, ROI, as well as customer satisfaction. In the past, there was no faster way available for communication, so collaboration was a time-consuming task for business; its effect was the loss of client retention. Imagine a situation where a sales representative is near to closing a deal, but the decision maker is out of the office. In the past, there was no fast/direct way to communicate. Sometimes this lack of efficient communication impacted the business negatively, in addition to the loss of potential opportunities. Summary In this article we learned cloud computing and Salesforce.com, and discussed about collaboration in the new era by comparing it to the ancient age. We discussed and introduced Salesforce Chatter and its effect on ROI (Return of Investment). Resources for Article: Further resources on this subject: Salesforce CRM Functions [Article] Configuration in Salesforce CRM [Article] Django 1.2 E-commerce: Data Integration [Article]

0
0
5775

article-image-making-goods-manufacturing-resource-planning

Packt

21 Nov 2013

4 min read

Making Goods with Manufacturing Resource Planning

Packt

21 Nov 2013

4 min read

0
0
1781

How-To Tutorials

Packt

21 Nov 2013

7 min read

Zurb Foundation – an Overview

Packt

21 Nov 2013

7 min read

(For more resources related to this topic, see here.) Most importantly, you can apply your creativity to make the design your own. Foundation gives you the tools you need for this. Then it gets out of the way and your site becomes your own. Especially when you advance to using the Foundation's SASS variables, functions and mixins, you have the ability to make your site your own unique creation. Foundation's grid system The foundation (pun intended) of Zurb Foundation is its grid system—rows and columns—much like a spread sheet, a blank sheet of graph paper, or tables, similar to what we used to use for HTML layout. Think of it as the canvas upon which you design your website. Each cell is a content area that can be merged with other cells, beside or below it, to make larger content areas. A default installation of Foundation will be based on twelve cells in a row. A column is comprised of one or more individual cells. Lay out a website Let's put Foundation's grid system to work in an example. We'll build a basic website with a two part header, a two part content area, a sidebar, and a three part footer area. With the simple techniques we demonstrate here, you can craft mostly any layout you want. Here is the mobile view Foundation works best when you design for small devices first, so here is what we want our small device (mobile) view to look like: This is the layout we want on mobile or small devices. But we've labeled the content areas with a title that describes where we want them on a regular desktop. By doing this, we are thinking ahead and creating a view ready for the desktop as well. Here is the desktop view Since a desktop display is typically wider than a mobile display, we have more horizontal space and things that had to be presented vertically on the mobile view can be displayed horizontally on the desktop view. Here is how we want our regular desktop or laptop to display the same content areas: These are not necessarily drawn to scale. It is the layout we are interested in. The two part header went from being one above the other in the mobile view to being side-by-side in the desktop view. The header on the top went left and the bottom header went right. All these make perfect sense. However, the sidebar shifted from being above the content area in the mobile view and shifted to its right in the mobile view. That's not natural when rendering HTML. Something must have happened! The content areas, left and right, stayed the same in both the views. And that's exactly what we wanted. The three part footer got rearranged. The center footer appears to have slid down between the left and right footers. That makes sense from a design perspective but it isn't natural from an HTML rendering perspective. Foundation provides the classes to easily make all this magic happen. Here is the code Unlike the early days of mobile design where a separate website was built for mobile devices, with Foundation you build your site once, and use classes to specify how it should look on both mobile and regular displays. Here is the HTML code that generates the two layouts: <header class="row"> <div class="large-6 column">Header Left</div> <div class="large-6 column">Header Right</div> </header> <main class="row"> <aside class="large-3 push-9 column">Sidebar Right</aside> <section class="large-9 pull-3 columns"> <article class="row"> <div class="small-9 column">Content Left</div> <div class="small-3 column">Content Right</div> </article> </section> </main> <footer class="row"> <div class="small-6 small-centered large-4 large-uncentered push-4 column">Footer Center</div> <div class="small-6 large-4 pull-4 column">Footer Left</div> <div class="small-6 large-4 column">Footer Right</div> </footer> That's all there is to it. Replace the text we used for labels with real content and you have a design that displays on mobile and regular displays in the layouts we've shown in this article. Toss in some widgets What we've shown above is just the core of the Foundation framework. As a toolkit, it also includes numerous CSS components and JavaScript plugins. Foundation includes styles for labels, lists, and data tables. It has several navigation components including Breadcrumbs, Pagination, Side Nav, and Sub Nav. You can add regular buttons, drop-down buttons, and button groups. You can make unique content areas with Block Grids, a special variation of the underlying grid. You can add images as thumbnails, put content into panels, present your video feed using the Flex Video component, easily add pricing tables, and represent progress bars. All these components only require CSS and are the easiest to integrate. By tossing in Foundation's JavaScript plugins, you have even more capabilities. Plugins include things like Alerts, Tooltips, and Dropdowns. These can be used to pop up messages in various ways. The Section plugin is very powerful when you want to organize your content into horizontal or vertical tabs, or when you want horizontal or vertical navigation. Like most components and plugins, it understands the mobile and regular desktop views and adapts accordingly. The Top Bar plugin is a favorite for many developers. It is a multi-level fly out menu plugin. Build your menu in HTML the way Top Bar expects. Set it up with the appropriate classes and it just works. Magellan and Joyride are two plugins that you can put to work to help show your viewers where they are on a page or to help them navigate to various sections on a page. Orbit is Foundation's slide presentation plugin. You often see sliders on the home page of websites these days. Clearing is similar to Orbit except that it displays thumbnails of the images in a presentation below the main display window. A viewer clicks on a thumbnail to display the full image. Reveal is a plugin that allows you to put a link anywhere on your page and when the viewer clicks on it, a box pops up extra content, which could even be an Orbit slider, is revealed. Interchange is one of the most recent additions to Foundation's plugin factory. With it you can selectively load images depending on the target environment. This lets you optimize bandwidth between your web server and your viewer's browser. Foundation also provides a great Forms plugin. On its own it is capable. With the additional Abide plugin you have a great deal of control over form layout and editing. Summary As you can see, Foundation is very capable of laying out web page for mobile devices and regular displays. One set of code, two very different looks. And that's just the beginning. Foundation's CSS components and JavaScript plugins can be placed on a web page in almost any content area. With these widgets you can have much more interaction with your viewers than you otherwise would. Put Foundation to work in your website today! Resources for Article: Further resources on this subject: Quick start – using Foundation 4 components for your first website [Article] Introduction to RWD frameworks [Article] Nesting, Extend, Placeholders, and Mixins [Article]

0
0
12509

article-image-basic-concepts-and-architecture-cassandra

Packt

21 Nov 2013

7 min read

Basic Concepts and Architecture of Cassandra

Packt

21 Nov 2013

7 min read

(For more resources related to this topic, see here.) CAP theorem If you want to understand Cassandra, you first need to understand the CAP theorem. The CAP theorem (published by Eric Brewer at the University of California, Berkeley) basically states that it is impossible for a distributed system to provide you with all of the following three guarantees: Consistency: Updates to the state of the system are seen by all the clients simultaneously Availability: Guarantee of the system to be available for every valid request Partition tolerance: The system continues to operate despite arbitrary message loss or network partition Cassandra provides users with stronger availability and partition tolerance with tunable consistency tradeoff; the client, while writing to and/or reading from Cassandra, can pass a consistency level that drives the consistency requirements for the requested operations. BigTable / Log-structured data model In a BigTable data model, the primary key and column names are mapped with their respective bytes of value to form a multidimensional map. Each table has multiple dimensions. Timestamp is one such dimension that allows the table to version the data and is also used for internal garbage collection (of deleted data). The next figure shows the data structure in a visual context; the row key serves as the identifier of the column that follows it, and the column name and value are stored in contiguous blocks: It is important to note that every row has the column names stored along with the values, allowing the schema to be dynamic. Column families Columns are grouped into sets called column families, which can be addressed through a row key (primary key). All the data stored in a column family is of the same type. A column family must be created before any data can be stored; any column key within the family can be used. It is our intent that the number of distinct column families in a keyspace should be small, and that the families should rarely change during an operation. In contrast, a column family may have an unbounded number of columns. Both disk and memory accounting are performed at the column family level. Keyspace A keyspace is a group of column families; replication strategies and ACLs are performed at the keyspace level. If you are familiar with traditional RDBMS, you can consider the keyspace as an alternative name for the schema and the column family as an alternative name for tables. Sorted String Table (SSTable) An SSTable provides a persistent file format for Cassandra; it is an ordered immutable storage structure from rows of columns (name/value pairs). Operations are provided to look up the value associated with a specific key and to iterate over all the column names and value pairs within a specified key range. Internally, each SSTable contains a sequence of row keys and a set of column key/value pairs. There is an index and the start location of the row key in the index file, which is stored separately. The index summary is loaded into the memory when the SSTable is opened in order to optimize the amount of memory needed for the index. A lookup for actual rows can be performed with a single disk seek and by scanning sequentially for the data. Memtable A memtable is a memory location where data is written to during update or delete operations. A memtable is a temporary location and will be flushed to the disk once it is full to form an SSTable. Basically, an update or a write operation to Cassandra is a sequential write to the commit log in the disk and a memory update; hence, writes are as fast as writing to memory. Once the memtables are full, they are flushed to the disk, forming new SSTables: Reads in Cassandra will merge the data from different SSTables and the data in memtables. Reads should always be requested with a row key (primary key) with the exception of a key range scan. When multiple updates are applied to the same column, Cassandra uses client-provided timestamps to resolve conflicts. Delete operations to a column work a little differently; because SSTables are immutable, Cassandra writes the tombstone to avoid random writes. A tombstone is a special value written to Cassandra instead of removing the data immediately. The tombstone can then be sent to nodes that did not get the initial remove request, and can be removed during GC. Compaction To bound the number of SSTable files that must be consulted on reads and to reclaim the space taken by unused data, Cassandra performs compactions. In a nutshell, compaction compacts n (the configurable number of SSTables) into one big SSTable. They start out being the same size as the memtables. Therefore, the sizes of the SSTables are exponentially bigger when they grow older. Partitioning and replication Dynamo style As mentioned previously, the partitioner and replication scheme is motivated by the Dynamo paper; let's talk about it in detail. Gossip protocol Cassandra is a peer-to-peer system with no single point of failure; the cluster topology information is communicated via the Gossip protocol. The Gossip protocol is similar to real-world gossip, where a node (say B) tells a few of its peers in the cluster what it knows about the state of a node (say A). Those nodes tell a few other nodes about A, and over a period of time, all the nodes know about A. Distributed hash table The key feature of Cassandra is the ability to scale incrementally. This includes the ability to dynamically partition the data over a set of nodes in the cluster. Cassandra partitions data across the cluster using consistent hashing and randomly distributes the rows over the network using the hash of the row key. When a node joins the ring, it is assigned a token that advocates where the node has to be placed in the ring: Now consider a case where the replication factor is 3; clients randomly write or read from a coordinator (every node in the system can act as a coordinator and a data node) in the cluster. The node calculates a hash of the row key and provides the coordinator enough information to write to the right node in the ring. The coordinator also looks at the replication factor and writes to the neighboring nodes in the ring order. Eventual consistency Given a sufficient period of time over which no changes are sent, all updates can be expected to propagate through the system and the replicas created will be consistent. Cassandra supports both the eventual consistency model and strong consistency model, which can be controlled from the client while performing an operation. Cassandra supports various consistency levels while writing or reading data. The consistency level drives the number data replicas the coordinator has to contact to get the data before acknowledging the clients. If W + R > Replication Factor, where W is the number of nodes to block on write and R the number to block on reads, the clients will see a strong consistency behavior: ONE: R/W at least one node TWO: R/W at least two nodes QUORUM: R/W from at least floor (N/2) + 1, where N is the replication factor When nodes are down for maintenance, Cassandra will store hints for updates performed on that node, which can be delivered back when the node is available in the future. To make data consistent, Cassandra relies on hinted handoffs, read repairs, and anti-entropy repairs. Summary In this article, we have discussed basic concepts and basic building blocks, including the motivation in building a new datastore solution. Resources for Article: Further resources on this subject: Apache Cassandra: Libraries and Applications [Article] About Cassandra [Article] Quick start – Creating your first Java application [Article]

0
0
6301

Packt

20 Nov 2013

6 min read

Securing the Hadoop Ecosystem

Packt

20 Nov 2013

6 min read

(For more resources related to this topic, see here.) Each ecosystem component has its own security challenges and needs to be configured uniquely based on its architecture to secure them. Each of these ecosystem components has end users directly accessing the component or a backend service accessing the Hadoop core components (HDFS and MapReduce). The following are the topics that we'll be covering in this article: Configuring authentication and authorization for the following Hadoop ecosystem components: Hive Oozie Flume HBase Sqoop Pig Best practices in configuring secured Hadoop components Configuring Kerberos for Hadoop ecosystem components The Hadoop ecosystem is growing continuously and maturing with increasing enterprise adoption. In this section, we look at some of the most important Hadoop ecosystem components, their architecture, and how they can be secured. Securing Hive Hive provides the ability to run SQL queries over the data stored in the HDFS. Hive provides the Hive query engine that converts Hive queries provided by the user to a pipeline of MapReduce jobs that are submitted to Hadoop (JobTracker or ResourceManager) for execution. The results of the MapReduce executions are then presented back to the user or stored in HDFS. The following figure shows a high-level interaction of a business user working with Hive to run Hive queries on Hadoop: There are multiple ways a Hadoop user can interact with Hive and run Hive queries; these are as follows: The user can directly run the Hive queries using Command Line Interface (CLI). The CLI connects to the Hive metastore using the metastore server and invokes Hive query engine directly to execute Hive query on the cluster. Custom applications written in Java and other languages interacts with Hive using the HiveServer. HiveServer, internally, uses the metastore server and the Hive Query Engine to execute the Hive query on the cluster. To secure Hive in the Hadoop ecosystem, the following interactions should be secured: User interaction with Hive CLI or HiveServer User roles and privileges needs to be enforced to ensure users have access to only authorized data The interaction between Hive and Hadoop (JobTracker or ResourceManager) has to be secured and the user roles and privileges should be propagated to Hadoop jobs To ensure secure Hive user interaction, there is a need to ensure that the user is authenticated by HiveServer or CLI before running any jobs on the cluster. The user has to first use the kinit command to fetch the Kerberos ticket. This ticket is stored in the credential cache and used to authenticate with Kerberos-enabled systems. Once the user is authenticated, Hive submits the job to Hadoop (JobTracker or ResourceManager). Hive needs to impersonate the user to execute MapReduce on the cluster. From Hive Version 0.11 onwards, HiveServer2 was introduced. The earlier HiveServer had serious security limitations related to user authentication. HiveServer2 supports Kerberos and LDAP authentication for the user authentication. When HiveServer2 is configured to have LDAP authentication, Hive users are managed using the LDAP store. Hive asks the users to submit the MapReduce jobs to Hadoop. Thus, if we configure HiveServer2 to use LDAP, only the user authentication between the client and HiveServer2 is addressed. The interaction of Hive with Hadoop is insecure, and Hive MapReduce will be able to access other users' data in the Hadoop cluster. On the other hand, when we use Kerberos authentication for Hive users with HiveServer2, the same user is impersonated to execute MapReduce on the Hadoop cluster. So it is recommended that in a production environment, we configure HiveServer2 with Kerberos to have a seamless authentication and access control for the users submitting Hive queries. The credential store for Kerberos KDC can be configured to be LDAP so that we can centrally manage the user credentials of the end users. To set up a secured Hive interactions, we need to do the following steps: One of the key steps in securing Hive interaction is to ensure that the Hive user is impersonated in Hadoop, as Hive executes a MapReduce job on the Hadoop cluster. To achieve this goal, we need to add the hive.server2.enable.impersonation configuration in hive-site.xml, and hadoop.proxyuser.hive.hosts and hadoop. proxyuser.hive.groups in core-site.xml. <property> <name>hive.server2.authentication</name> <value>KERBEROS</value> </property> <property> <name>hive.server2.authentication.kerberos.principal</name> <value>hive/_HOST@YOUR-REALM.COM</value> </property> <property> <name>hive.server2.authentication.kerberos.keytab</name> <value>/etc/hive/conf/hive.keytab</value> </property> <property> <name>hive.server2.enable.impersonation</name> <description>Enable user impersonation for HiveServer2</description> <value>true</value> </property> Securing Hive using Sentry In the previous section, we saw how Hive authentication can be enforced using Kerberos and the user privileges that are enforced by using user impersonation in Hadoop by the superuser. Sentryis the one of the latest entrant in the Hadoop ecosystem that provides finegrained user authorization for the data that is stored in Hive. Sentry provides finegrained, role-based authorization to Hive and Impala. Sentry uses HiveServer2 and metastore server to execute the queries on the Hadoop platform. However, the user impersonation is turned off in HiveServer2 when Sentry is used. Sentry enforces user privileges on the Hadoop data using the Hive metastore. Sentry supports authorization policies per database/schema. This could be leveraged to enforce user management policies. More details on Sentry are available at the following URL: http://www.cloudera.com/content/cloudera/en/products/cdh/sentry.html Summary In this article we learned how to configure Kerberos for Hadoop ecosystem components. We also looked at how to secure Hive using Sentry. Resources for Article: Further resources on this subject: Advanced Hadoop MapReduce Administration [Article] Managing a Hadoop Cluster [Article] Making Big Data Work for Hadoop and Solr [Article]

0
0
3877

Packt

20 Nov 2013

11 min read

Setting Up NameNode HA

Packt

20 Nov 2013

11 min read

(For more resources related to this topic, see here.) We will configure our NameNode HA setup by adding several options to the core-site.xml file. The following is the structure of the file for this particular step. It will give you an idea of the XML structure, if you are not familiar with it. The header comments are stripped out: <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>fs.default.name</name> <value>hdfs://sample-cluster/</value> </property> <property> <name>ha.zookeeper.quorum</name> <value>nn1.hadoop.test.com:2181,nn2.hadoop.test.com:2181,jt1.hadoop. test.com:2181 </value> </property> </configuration> The configuration file format is pretty much self-explanatory; variables are surrounded by the <property> tag, and each variable has a name and a value. There are only two variables that we need to add at this stage. fs.default.name is the logical name of the NameNode cluster. The value hdfs://sample-cluster/ is specific to the HA setup. This is the logical name of the NameNode cluster. We will define the servers that comprise of it in the hdfs-site.xml file. In a non-HA setup, this variable is assigned a host and a port of the NameNode, since there is only one NameNode in the cluster. The ha.zookeeper.quorum variable specifies locations and ports of the ZooKeeper servers. The ZooKeeper cluster can be used by other services, such as HBase, that is why it is defined in core-site.xml. The next step is to configure the hdfs-site.xml file and add all HDFS-specific parameters there. I will omit the <property> tag and only include <name> and <value> to make the list less verbose. <name>dfs.name.dir</name> <value>/dfs/nn/</value> NameNode will use the location specified by the dfs.name.dir variable to store the persistent snapshot of HDFS metadata. This is where the fsimage file will be stored. As discussed previously, the volume on which this directory resides needs to be backed by RAID. Losing this volume means losing NameNode completely. The /dfs/nn path is an example, however you are free to choose your own. You can actually specify several paths with a dfs.name.dir value, separating them by commas. NameNode will mirror the metadata files in each directory specified. If you have a shared network storage available, you can use it as one of the destinations for HDFS metadata. This will provide additional offsite backups. <name>dfs.nameservices</name> <value>sample-cluster</value> The dfs.nameservices variable specifies the logical name of the NameNode cluster and should be replaced by something that makes sense to you, such as prod-cluster or stage-cluster. The value of dfs.nameservices must match the value of fs.default.name from the core-site.xml file. <name>dfs.ha.namenodes.sample-cluster</name> <value>nn1,nn2</value> Here, we specify the NameNodes that make up our HA cluster setup. These are logical names, not real server hostnames or IPs. These logical names will be referenced in other configuration variables. <name>dfs.namenode.rpc-address.sample-cluster.nn1</name> <value>nn1.hadoop.test.com:8020</value> <name>dfs.namenode.rpc-address.sample-cluster.nn2</name> <value>nn2.hadoop.test.com:8020</value> This pair of variables provide mapping from logical names like nn1 and nn2 to the real host and port value. By default, NameNode daemons use port 8020 for communication with clients and each other. Make sure this port is open for the cluster nodes. <name>dfs.namenode.http-address.sample-cluster.nn1</name> <value>nn1.hadoop.test.com:50070</value> <name>dfs.namenode.http-address.sample-cluster.nn2</name> <value>nn2.hadoop.test.com:50070</value> Each NameNode daemon runs a built-in HTTP server, which will be used by the NameNode web interface to expose various metrics and status information about HDFS operations. Additionally, standby NameNode uses HTTP calls to periodically copy the fsimage file from the primary server, perform the checkpoint operation, and ship it back. <name>dfs.namenode.shared.edits.dir</name> <value>qjournal://nn1.hadoop.test.com:8485;nn2.hadoop.test.com:8485; jt1.hadoop.test.com:8485/sample-cluster</value> The dfs.namenode.shared.edits.dir variable specifies the setup of the JournalNode cluster. In our configuration, there are three JournalNodes running on nn1, nn2, and nn3. Both primary and standby nodes will use this variable to identify which hosts they should contact to send or receive new changes from editlog. <name>dfs.journalnode.edits.dir</name> <value>/dfs/journal</value> JournalNodes need to persist editlog changes that are being submitted to them by the active NameNode. The dfs.journalnode.edits.dir variable specifies the location on the local filesystem where editlog changes will be stored. Keep in mind that this path must exist on all JournalNodes and the ownership of all directories must be set to hdfs:hdfs (user and group). <name>dfs.client.failover.proxy.provider.sample-cluster</name> <value>org.apache.hadoop.hdfs.server.namenode.ha. ConfiguredFailoverProxyProvider</value> In an HA setup, clients that access HDFS need to know which NameNode to contact for their requests. The dfs.client.failover.proxy.provider.sample-cluster variable specifies the Java class name, which will be used by clients for determining the active NameNode. At the moment, there is only ConfiguredFailoverProxyProvider available. <name>dfs.ha.automatic-failover.enabled</name> <value>true</value> The dfs.ha.automatic-failover.enabled variable indicates if the NameNode cluster will use a manual or automatic failover. <name>dfs.ha.fencing.methods</name> <value>sshfence shell(/bin/true) </value> Orchestrating failover in a cluster setup is a complicated task involving multiple steps. One of the common problems that is not unique to the Hadoop cluster, but affects any distributed systems, is a "split-brain" scenario. Split-brain is a case where two NameNodes decide they both play an active role and start writing changes to the editlog. To prevent such an issue from occurring, the HA configuration maintains a marker in ZooKeeper, clearly stating which NameNode is active, and JournalNodes accepts writes only from that node. To be absolutely sure that the two NameNodes don't become active at the same time, a technique called fencing is used during failover. The idea is to force the shutdown of the active NameNode before transferring the active state to a standby. There are two fencing methods currently available: sshfence and shell. sshfence. These require a passwordless ssh access as a user that starts the NameNode daemon, from the active NameNode to the standby and vice versa. By default, this is the hdfs user. The fencing process checks if there is anyone listening on a NameNode port using the nc command, and if the port is found busy, it tries to kill the NameNode process. Another option for dfs.ha.fencing.methods is shell. This will execute the specified shell script to perform fencing. It is important to understand that failover will fail if fencing fails. In our case, we specified two options, the second one always returns success. This is done for workaround cases where the primary NameNode machine goes down and the ssh method will fail, and no failover will be performed. We want to avoid this, so the second option would be to failover anyway, even without fencing, which, as already mentioned, is safe with our setup. To achieve this, we specify two fencing methods, which will be tried by ZKFC in the order of: if the first one fails, the second one will be tried. In our case, the second one will always return success and failover will be initiated, even if the server running the primary NameNode is not available via ssh. <name>dfs.ha.fencing.ssh.private-key-files</name> <value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value> The last option we will need to configure for NameNode HA setup is the ssh key, which will be used by sshfence. Make sure you change the ownership for this file to hdfs user. Two keys need to be generated, one for the primary and one for the secondary NameNode. It is a good idea to test ssh access as an hdfs user in both directions to make sure it is working fine. The hdfs-site.xml configuration file is now all set for testing the HA setup. Don't forget to sync these configuration files to all the nodes in the cluster. The next thing that needs to be done is to start JournalNodes. Execute this command on nn1, nn2, and jt1 a root user: # service hadoop-hdfs-journalnode start With CDH, it is recommended to always use the service command instead of calling scripts in /etc/init.d/ directly. This is done to guarantee that all environment variables are set up properly before the daemon is started. Always check the logfiles for daemons. Now, we need to initially format HDFS. For this, run the following command on nn1: # sudo -u hdfs hdfs namenode –format This is the initial setup of the NameNode, so we don't have to worry about affecting any HDFS metadata, but be careful with this command, because it will destroy any previous metadata entries. There is no strict requirement to run format command on nn1, but to make it easier to follow, let's assume we want nn1 to become an active NameNode. Format command will also format the storage for the JournalNodes. The next step is to create an entry for the HA cluster in ZooKeeper, and start NameNode and ZKFC on the first NameNode. In our case, this is nn1: # sudo -u hdfs hdfs zkfc -formatZK # service hadoop-hdfs-namenode start # service hadoop-hdfs-zkfc start Check the ZKFC log file (by default, it is in /var/log/hadoop-hdfs/) to make sure nn1 is now an active NameNode: INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at nn1.hadoop.test.com/192.168.0.100:8020 active... INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at nn1.hadoop.test.com/192.168.0.100:8020 to active state To activate the secondary NameNode, an operation called bootstrapping needs to be performed. To do this, execute the following command on nn2: # sudo -u hdfs hdfs namenode –bootstrapStandby This will pull the current filesystem state from active NameNode and synchronize the secondary NameNode with the JournalNodes Quorum. Now, you are ready to start the NameNode daemon and the ZKFC daemon on nn2. Use the same commands that you used for nn1. Check the ZKFC log file to make sure nn2 successfully acquired the secondary NameNode role. You should see the following messages at the end of the logfile: INFO org.apache.hadoop.ha.ZKFailoverController: ZK Election indicated that NameNode at nn2.hadoop.test.com/192.168.0.101:8020 should become standby INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at nn2.hadoop.test.com/192.168.0.101:8020 to standby state This is the last step in configuring NameNode HA. It is a good idea to verify if automatic failover is configured correctly, and if it will behave as expected in the case of a primary NameNode outage. Testing failover in the cluster setup stage is easier and safer than discovering that failover doesn't work during production stage and causing a cluster outage. You can perform a simple test: kill the primary NameNode daemon and verify if the secondary takes over its role. After that, bring the old primary back online and make sure it takes over the secondary role. You can use execute the following command to get the current status of NameNode nn1: # sudo -u hdfs hdfs haadmin -getServiceState nn1 The hdfs haadmin command can also be used to initiate a failover in manual failover setup. At this point, you have a fully configured and functional NameNode HA setup. Summary We saw in this article how to configure Hadoop's NameNode HA. Resources for Article: Further resources on this subject: Advanced Hadoop MapReduce Administration [Article] Managing a Hadoop Cluster [Article] Making Big Data Work for Hadoop and Solr [Article]

0
0
5573

article-image-overview-process-management-microsoft-visio-2013

Packt

20 Nov 2013

6 min read

Overview of Process Management in Microsoft Visio 2013

Packt

20 Nov 2013

6 min read

(For more resources related to this topic, see here.) When Visio was first conceived of over 20 years ago, its first stated marketing aim was to outsell ABC Flowcharter, the best-selling process diagramming tool at the time. Therefore, Visio had to have all of the features from the start that are core in the creation of flowcharts, namely the ability to connect one shape to another and to have the lines route themselves around shapes. Visio soon achieved its aim, and looked for other targets to reach. So, process flow diagrams have long been a cornerstone of Visio's popularity and appeal and, although there have been some usability improvements over the years, there have been few enhancements to turn the diagrams into models that can be managed efficiently. Microsoft Visio 2010 saw the introduction of two features, structured diagrams and validation rules, that make process management achievable and customizable, and Microsoft Visio 2013 sees these features enhanced. In this article, you will be introduced to the new features that have been added to Microsoft Visio to support structured diagrams and validation. You will see where Visio fits in the Process Management stack, and explore the relevant out of the box content. Exploring the new process management features in Visio 2013 Firstly, Microsoft Visio 2010 introduced a new Validation API for structured diagrams and provided several examples of this in use, for example with the BPMN (Business Process Modeling Notation) Diagram and Microsoft SharePoint Workflow templates and the improvements to the Basic Flowchart and Cross-Functional Flowchart templates, all of which are found in the Flowchart category. Microsoft Visio 2013 has updated the version of BPMN from 1.1 to 2.0, and has introduced a new SharePoint 2013 Workflow template, in addition to the 2010 one. Templates in Visio consist of a predefined Visio document that has one or more pages, and may have a series of docked stencils (usually positioned on the left-hand side of workspace area). The template document may have an associated list of add-ons that are active while it is in use, and, with Visio 2013 Professional edition, an associated list of structured diagram validation rulesets as well. Most of the templates that contain validation rules in Visio 2013 are in the Flowchart category, as seen in the following screenshot, with the exception being the Six Sigma template in the Business category. Secondly, the concept of a Subprocess was introduced in Visio 2010. This enables processes to hyperlink to other pages describing the subprocesses in the same document, or even across documents. This latter point is necessary if subprocesses are stored in a document library, such as Microsoft SharePoint. The following screenshot illustrates how an existing subprocess can be associated with a shape in a larger process, selecting an existing shape in the diagram, before selecting the existing page that it links to from the drop-down menu on the Link to Existing button. In addition, a subprocess page can be created from an existing shape, or a selection of shapes, in which case they will be moved to the newly-created page. There were also a number of ease-of-use features introduced in Microsoft Visio 2010 to assist in the creation and revision of process flow diagrams. These include: Easy auto-connection of shapes Aligning and spacing of shapes Insertion and deletion of connected shapes Improved cross-functional flowcharts Subprocesses An infinite page option, so you need not go over the edge of the paper ever again Microsoft Visio 2013 has added two more notable features: Commenting (a replacement for the old reviewer's comments) Co-authoring However, this book is not about teaching the user how to use these features, since there will be many other authors willing to show you how to perform tasks that only need to be explained once. This book is about understanding the Validation API in particular, so that you can create, or amend, the rules to match the business logic that your business requires. Reviewing Visio Process Management capabilities Microsoft Visio now sits at the top of the Microsoft Process Management Product Stack, providing a Business Process Analysis (BPA) or Business Process Modeling (BPM) tool for business analysts, process owners/participants, and line of business software architects/developers. Understanding the Visio BMP Maturity Model If we look at the Visio BPM Maturity Model that Microsoft has previously presented to its partners, then we can see that Visio 2013 has filled some of the gaps that were still there after Visio 2010. However, we can also see that there are plenty of opportunities for partners to provide solutions on top of the Visio platform. The maturity model shows how Visio initially provided the means to capture paper-drawn business processes into electronic format, and included the ability to encapsulate data into each shape and infer the relationship and order between elements through connectors. Visio 2007 Professional added the ability to easily link shapes, which represent processes, tasks, decisions, gateways, and so on with a data source. Along with that, data graphics were provided to enable shape data to be displayed simply as icons, data bars, text, or to be colored by value. This enriched the user experience and provided quicker visual representation of data, thus increasing the comprehension of the data in the diagrams. Generic templates for specific types of business modeling were provided. Visio had a built-in report writer for many versions, which provided the ability to export to Excel or XML, but Visio 2010 Premium introduced the concept of validation and structured diagrams, which meant that the information could be verified before exporting. Some templates for specific types of business modeling were provided. Visio 2010 Premium also saw the introduction of Visio Services on SharePoint that provided the automatic (without involving the Visio client) refreshing of data graphics that were linked to specific types of data sources. Throughout this book we will be going into detail about Level 5 (Validation) in Visio 2013, because it is important to understand the core capabilities provided in Visio 2013. We will then be able to take the opportunity to provide custom Business Rule Modeling and Visualization. Reviewing the foundations of structured diagramming A structured diagram is a set of logical relationships between items, where these relationships provide visual organization or describe special interaction behaviors between them. The Microsoft Visio team analyzed the requirements for adding structure to diagrams and came up with a number of features that needed to be added to the Visio product to achieve this: Container Management: The ability to add labeled boxes around shapes to visually organize them Callout Management: The ability to associate callouts with shapes to display notes List Management: To provide order to shapes within a container Validation API: The ability to test the business logic of a diagram Connectivity API: The ability to create, remove, or traverse connections easily The following diagram demonstrates the use of Containers and Callouts in the construction of a basic flowchart, that has been validated using the Validation API, which in turn uses the Connectivity API.

0
0
9330

Unity Networking – The Pong Game

Platform as a Service

Drawing and Drawables in Android Canvas

Security considerations

Viewing on Mobile Devices

Exploring Model View Controller

Our First Machine Learning Method – Linear Classification

Developing apps with the Google Speech APIs

Introducing Salesforce Chatter

Making Goods with Manufacturing Resource Planning

Trending Topics

Zurb Foundation – an Overview

Basic Concepts and Architecture of Cassandra

Securing the Hadoop Ecosystem

Setting Up NameNode HA

Overview of Process Management in Microsoft Visio 2013

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access